1 Survival Analysis : Semi-parametric model

Team Member :

Logo
Logo

Dr Omar bin Nazmi

Dr Ahmad Syahid bin Ibrahim

Dr Mohd Hilmi bin Mat Husain

2 Introduction

Survival analysis provides estimation of the association between risk factors and time-to-evet and make prediction of subject’s survival probabilities. Cox propotional hazard model or cox refression is a semiparametric model used to investigare the association between outcomes and predictors factors. unlike parametric model,in Cox PH , it holds assumption of :

  1. Hazard ratio is constant over time or in other words, the hazard for one individual is proportional to the hazard for other individuals
  2. Another assumption is the independence of survival time. The occurrence or timing of an event for one individual should not influence that of another.

This exercise will provide practical on Semiparametric model for Survival Analysis. The analysis will be using data : stroke_fatality.dta The event variable is status( dead vs censored)

3 Survival analysis : Semi-parametric model

4 Prepare environment

Load Package

library(haven)
## Warning: package 'haven' was built under R version 4.4.2
library(broom)
## Warning: package 'broom' was built under R version 4.4.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.4.2
library(ggplot2)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded

Load package for survival Analysis

library(survival)
library(survminer)
## Warning: package 'survminer' was built under R version 4.4.2
## Loading required package: ggpubr
## Warning: package 'ggpubr' was built under R version 4.4.2
## 
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
## 
##     myeloma

5 Read Data

library(haven)
stroke_fatalityssss <- read_dta("stroke_fatality.dta")
glimpse(stroke_fatalityssss)
## Rows: 226
## Columns: 49
## $ sex          <dbl+lbl> 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1,…
## $ race         <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ doa          <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod          <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time         <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2        <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2      <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ gcs          <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp          <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp          <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr           <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm           <dbl+lbl>  1, NA,  1,  8, NA,  2,  1,  1,  2,  2,  2,  2, NA, N…
## $ hpt          <dbl+lbl>  1,  1,  1,  1,  1,  2,  1,  1,  1,  2,  1,  2,  1,  …
## $ ckd          <dbl+lbl> NA, NA,  2,  8, NA,  2,  8,  2,  2,  2,  2,  2, NA, N…
## $ af           <dbl+lbl> NA, NA,  2,  8, NA,  2,  8,  2,  2,  2,  2,  2, NA, N…
## $ hd           <dbl+lbl> NA, NA,  2,  8, NA,  2,  8,  1,  2,  2,  2,  2, NA,  …
## $ dyslipid     <dbl+lbl> NA, NA,  1,  8, NA,  2,  8,  2,  2,  2,  2,  2, NA, N…
## $ tia          <dbl+lbl> NA, NA,  2,  8, NA,  2,  8,  2,  2,  2,  2,  2, NA, N…
## $ smoker       <dbl+lbl> NA, NA,  2,  8, NA,  2,  8,  2,  2,  1,  2,  2, NA, N…
## $ hb           <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat         <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc          <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na           <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas        <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc         <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs          <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol         <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg           <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea         <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral     <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport    <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2         <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2         <dbl+lbl> 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1,…
## $ marital2     <dbl+lbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 4, 3, 1,…
## $ status1b     <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ status2b     <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ status3b     <dbl+lbl> 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,…
## $ hpt2         <dbl+lbl>  1,  1,  1,  1,  1,  0,  1,  1,  1,  0,  1,  0,  1,  …
## $ marital3     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2        <dbl+lbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ sex3         <dbl+lbl> 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1,…
## $ referral2    <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <dbl+lbl> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,…
## $ icd10cat     <dbl+lbl> 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,…
## $ icd10cat2    <dbl+lbl> 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,…
## $ icd10cat3    <dbl+lbl> 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1,…
## $ dm2cat       <dbl> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat      <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
stroke.fatality <- stroke_fatalityssss %>%
  mutate_if(is.labelled, ~as_factor(.))
glimpse(stroke.fatality)
## Rows: 226
## Columns: 49
## $ sex          <fct> female, male, female, female, male, female, female, femal…
## $ race         <fct> malay, malay, malay, malay, malay, chinese, malay, malay,…
## $ doa          <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod          <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time         <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2        <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2      <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ gcs          <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp          <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp          <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr           <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm           <fct> yes, NA, yes, 8, NA, 2, yes, yes, 2, 2, 2, 2, NA, NA, yes…
## $ hpt          <fct> yes, yes, yes, yes, yes, 2, yes, yes, yes, 2, yes, 2, yes…
## $ ckd          <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ af           <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ hd           <fct> NA, NA, 2, 8, NA, 2, 8, yes, 2, 2, 2, 2, NA, yes, yes, NA…
## $ dyslipid     <fct> NA, NA, yes, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, N…
## $ tia          <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ smoker       <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, yes, 2, 2, NA, NA, 8, NA, N…
## $ hb           <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat         <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc          <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na           <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas        <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc         <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs          <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol         <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg           <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea         <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral     <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport    <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2         <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2         <fct> female, male, female, female, male, female, female, femal…
## $ marital2     <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, yes, 4, 4, 4, 3, yes,…
## $ status1b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status2b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status3b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ hpt2         <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, no, yes, no, …
## $ marital3     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2        <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes…
## $ sex3         <fct> female, male, female, female, male, female, female, femal…
## $ referral2    <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <fct> GP/Home/Missing, GP/Home/Missing, GP/Home/Missing, hospit…
## $ icd10cat     <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat2    <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat3    <fct> "CI,Others", "CI,Others", "ICB, Other Haemorrhage", "ICB,…
## $ dm2cat       <dbl> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat      <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

Change format to factor ( for variable dm2cat, hpt2cat, dyslipid2cat)

library(dplyr)

stroke.fatality1 <- stroke.fatality %>%
  mutate(
    dm2cat = as_factor(dm2cat),
    hpt2cat = as_factor(hpt2cat),
    dyslipid2cat = as_factor(dyslipid2cat)
  )
str(stroke.fatality1$dm2cat)
##  Factor w/ 2 levels "0","1": 2 1 2 1 1 1 2 2 1 1 ...
levels(stroke.fatality1$dm2cat)
## [1] "0" "1"
class(stroke.fatality1$dm2cat)
## [1] "factor"
stroke.fatality1 %>% 
  tbl_summary()
Characteristic N = 2261
sex
    male 97 (43%)
    female 129 (57%)
race
    malay 211 (93%)
    chinese 11 (4.9%)
    indians 2 (0.9%)
    others 2 (0.9%)
    not recorded 0 (0%)
    missing 0 (0%)
date of admission 2011-10-29 (2011-06-10, 2012-03-12)
date of discharge 2011-11-02 (2011-06-18, 2012-03-16)
time 4 (2, 7)
time2 4 (2, 7)
status at discharge
    alive 173 (77%)
    dead 53 (23%)
earliest Glasgow Coma Scale 15.0 (10.0, 15.0)
    Unknown 1
earliest systolic BP (mmHg) 161 (143, 186)
    Unknown 1
earliest diastolic BP (mmHg) 91 (79, 104)
    Unknown 1
heart rate 80 (68, 95)
    Unknown 2
diabetes
    no 9 (5.8%)
    yes 74 (48%)
    2 44 (28%)
    8 28 (18%)
    Unknown 71
hypertension
    no 2 (1.0%)
    yes 157 (75%)
    2 37 (18%)
    8 13 (6.2%)
    Unknown 17
chronic kidney disease
    no 11 (10%)
    yes 10 (9.2%)
    2 44 (40%)
    8 44 (40%)
    Unknown 117
atrial fibrillation
    no 10 (9.7%)
    yes 5 (4.9%)
    2 44 (43%)
    8 44 (43%)
    Unknown 123
heart ds
    no 11 (8.7%)
    yes 30 (24%)
    2 46 (37%)
    8 39 (31%)
    Unknown 100
dyslipidemia
    no 10 (8.1%)
    yes 27 (22%)
    2 43 (35%)
    8 43 (35%)
    Unknown 103
transient ischaemic attack
    no 11 (11%)
    yes 0 (0%)
    2 44 (44%)
    8 46 (46%)
    Unknown 125
smoking
    no 11 (9.9%)
    yes 22 (20%)
    2 44 (40%)
    8 34 (31%)
    Unknown 115
hemoglobin 13.40 (11.60, 14.65)
    Unknown 6
platelet 235 (186, 277)
    Unknown 6
total white cell count 9.9 (7.5, 12.7)
    Unknown 6
natrium 138.0 (136.0, 140.0)
    Unknown 1
potassium 3.80 (3.50, 4.20)
    Unknown 1
glucose 5.8 (4.9, 8.8)
    Unknown 200
capillary blood sugar 8.1 (6.4, 12.1)
    Unknown 17
fasting tot chol(mmol/l) 6.05 (4.95, 6.80)
    Unknown 186
fasting TG (mmol/l) 1.30 (0.90, 1.90)
    Unknown 187
urea 6.5 (4.8, 9.0)
    Unknown 1
referral
    1 37 (17%)
    2 6 (2.8%)
    3 50 (23%)
    4 123 (57%)
    Unknown 10
transport
    1 97 (45%)
    2 119 (55%)
    Unknown 10
age in years 61 (52, 69)
sex
    male 97 (43%)
    female 129 (57%)
marital status
    no 0 (0%)
    yes 10 (4.5%)
    2 1 (0.4%)
    3 10 (4.5%)
    4 202 (91%)
    Unknown 3
status (alive, dead, etc)
    alive 173 (77%)
    dead 53 (23%)
    others 0 (0%)
    absconded 0 (0%)
status2b
    alive 173 (77%)
    dead 53 (23%)
status @discharge 1=dead, 0=alive
    alive 173 (77%)
    dead 53 (23%)
hpt2 148 (75%)
    Unknown 28
marital3 205 (91%)
race2 211 (93%)
male=1, female=0
    female 129 (57%)
    male 97 (43%)
referral2
    0 37 (17%)
    1 6 (2.8%)
    2 50 (23%)
    3 123 (57%)
    Unknown 10
0 is hospital, 1=GP or home or missing
    hospital 87 (38%)
    GP/Home/Missing 139 (62%)
0=Cerebral Isch , 1=Haemorrhagic, 2= Other
    CI,Others 145 (64%)
    Haemorrhagic 77 (34%)
    Others 4 (1.8%)
0=Cerebral Isch and others, 1=Haemorrhagic
    CI,Others 149 (66%)
    Haemorrhagic 77 (34%)
0=Cerebral Isch or others, 1=SAH, 2=Haemorrhagic
    CI,Others 149 (66%)
    SAH 19 (8.4%)
    ICB, Other Haemorrhage 58 (26%)
dm2cat
    0 152 (67%)
    1 74 (33%)
hpt2cat
    0 78 (35%)
    1 148 (65%)
dyslipid2cat
    0 199 (88%)
    1 27 (12%)
1 n (%); Median (Q1, Q3)
glimpse(stroke.fatality1)
## Rows: 226
## Columns: 49
## $ sex          <fct> female, male, female, female, male, female, female, femal…
## $ race         <fct> malay, malay, malay, malay, malay, chinese, malay, malay,…
## $ doa          <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod          <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time         <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2        <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2      <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ gcs          <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp          <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp          <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr           <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm           <fct> yes, NA, yes, 8, NA, 2, yes, yes, 2, 2, 2, 2, NA, NA, yes…
## $ hpt          <fct> yes, yes, yes, yes, yes, 2, yes, yes, yes, 2, yes, 2, yes…
## $ ckd          <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ af           <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ hd           <fct> NA, NA, 2, 8, NA, 2, 8, yes, 2, 2, 2, 2, NA, yes, yes, NA…
## $ dyslipid     <fct> NA, NA, yes, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, N…
## $ tia          <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ smoker       <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, yes, 2, 2, NA, NA, 8, NA, N…
## $ hb           <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat         <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc          <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na           <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas        <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc         <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs          <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol         <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg           <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea         <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral     <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport    <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2         <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2         <fct> female, male, female, female, male, female, female, femal…
## $ marital2     <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, yes, 4, 4, 4, 3, yes,…
## $ status1b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status2b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status3b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ hpt2         <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, no, yes, no, …
## $ marital3     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2        <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes…
## $ sex3         <fct> female, male, female, female, male, female, female, femal…
## $ referral2    <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <fct> GP/Home/Missing, GP/Home/Missing, GP/Home/Missing, hospit…
## $ icd10cat     <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat2    <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat3    <fct> "CI,Others", "CI,Others", "ICB, Other Haemorrhage", "ICB,…
## $ dm2cat       <fct> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat      <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <fct> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

6 Select variables

Variable selection

we will select importance variables for analysis. A multi-centered prospective cohort study has reported that determinants for stroke fatality involve patient-level and system-level determinants (Sarfo, Fred S et al. 2023).

selected variable :

1. important variables for stroke fatality (Sarfo, Fred S et al. 2023 )

2. Complete dataset

variable selected : 1. time : duration (days) = DOD-DOA 2. status3b : status of the patient at discharge : dead or alive 3. age2 (numerical): age in years 4. gcs (numerical ) : gcs score 5. sex (categorical): Male, female 6.icd10cat : category if diagnosis ICD10 : inschaemia(CI) and others, haemorhagic 7. dm2cat : DM status ( categorical, 0 = No, 1 = Yes) 8. hpt2cat : Hypertension status ( categorical, 0 = No, 1 = Yes ) 9. dyslipid2cat : Dsylipidemia status ( categorical, 0 = No, 1 = Yes )

Outcome event of interest( status ) : Dead

stroke1 <- stroke.fatality1 %>%
  dplyr::select(time, status3b, age2, gcs, sex, dm2cat, icd10cat2, hpt2cat, dyslipid2cat)
glimpse(stroke1)
## Rows: 226
## Columns: 9
## $ time         <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status3b     <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ age2         <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ gcs          <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sex          <fct> female, male, female, female, male, female, female, femal…
## $ dm2cat       <fct> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ icd10cat2    <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ hpt2cat      <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <fct> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
summary(stroke1)
##       time         status3b        age2            gcs            sex     
##  Min.   : 0.000   alive:173   Min.   :20.00   Min.   : 3.00   male  : 97  
##  1st Qu.: 2.250   dead : 53   1st Qu.:52.00   1st Qu.:10.00   female:129  
##  Median : 4.000               Median :61.00   Median :15.00               
##  Mean   : 6.469               Mean   :60.82   Mean   :12.44               
##  3rd Qu.: 7.000               3rd Qu.:68.75   3rd Qu.:15.00               
##  Max.   :61.000               Max.   :94.00   Max.   :15.00               
##                                               NA's   :1                   
##  dm2cat         icd10cat2   hpt2cat dyslipid2cat
##  0:152   CI,Others   :149   0: 78   0:199       
##  1: 74   Haemorrhagic: 77   1:148   1: 27       
##                                                 
##                                                 
##                                                 
##                                                 
## 

Missing data in view of missing a single value of data in covariate gcs, we will do imputation using single imputation method.

stroke1$gcs[is.na(stroke1$gcs)] <- mean(stroke1$gcs, na.rm = TRUE)
summary(stroke1)
##       time         status3b        age2            gcs            sex     
##  Min.   : 0.000   alive:173   Min.   :20.00   Min.   : 3.00   male  : 97  
##  1st Qu.: 2.250   dead : 53   1st Qu.:52.00   1st Qu.:10.00   female:129  
##  Median : 4.000               Median :61.00   Median :15.00               
##  Mean   : 6.469               Mean   :60.82   Mean   :12.44               
##  3rd Qu.: 7.000               3rd Qu.:68.75   3rd Qu.:15.00               
##  Max.   :61.000               Max.   :94.00   Max.   :15.00               
##  dm2cat         icd10cat2   hpt2cat dyslipid2cat
##  0:152   CI,Others   :149   0: 78   0:199       
##  1: 74   Haemorrhagic: 77   1:148   1: 27       
##                                                 
##                                                 
##                                                 
## 
tbl_summary(stroke1, 
  by = status3b, 
  statistic = list(all_continuous() ~ "{mean} ({sd})", all_categorical() ~ "{n} ({p}%)")
) %>%
  modify_header(label = "**Variable**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Status**") %>%
  modify_caption("Summary of Data by Status")
Summary of Data by Status
Variable
Status
alive
N = 173
1
dead
N = 53
1
time 6 (7) 8 (9)
age in years 60 (14) 62 (15)
earliest Glasgow Coma Scale 13.7 (2.7) 8.5 (4.0)
sex

    male 82 (47%) 15 (28%)
    female 91 (53%) 38 (72%)
dm2cat

    0 113 (65%) 39 (74%)
    1 60 (35%) 14 (26%)
0=Cerebral Isch and others, 1=Haemorrhagic

    CI,Others 132 (76%) 17 (32%)
    Haemorrhagic 41 (24%) 36 (68%)
hpt2cat

    0 60 (35%) 18 (34%)
    1 113 (65%) 35 (66%)
dyslipid2cat

    0 147 (85%) 52 (98%)
    1 26 (15%) 1 (1.9%)
1 Mean (SD); n (%)

7 Kaplan-Meir survival estimates for overall

KM1 <- survfit(Surv(time = time, status3b == 'dead') ~ 1, 
               type = "kaplan-meier", data = stroke1)
summary(KM1)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ 1, 
##     data = stroke1, type = "kaplan-meier")
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0    226       3    0.987 0.00761       0.9719        1.000
##     1    221      10    0.942 0.01559       0.9120        0.973
##     2    197       4    0.923 0.01797       0.8884        0.959
##     3    169       4    0.901 0.02060       0.8616        0.942
##     4    133       4    0.874 0.02403       0.8282        0.922
##     5     92       5    0.827 0.03071       0.7685        0.889
##     6     67       3    0.789 0.03601       0.7220        0.863
##     7     58       4    0.735 0.04259       0.6561        0.823
##     9     44       2    0.702 0.04675       0.6157        0.800
##    10     38       1    0.683 0.04903       0.5935        0.786
##    12     34       4    0.603 0.05742       0.5001        0.727
##    14     25       2    0.555 0.06213       0.4452        0.691
##    18     20       1    0.527 0.06492       0.4138        0.671
##    22     16       1    0.494 0.06870       0.3761        0.649
##    25     10       2    0.395 0.08321       0.2615        0.597
##    28      6       1    0.329 0.09178       0.1907        0.569
##    29      5       1    0.263 0.09413       0.1308        0.531
##    41      3       1    0.176 0.09528       0.0606        0.509

The survival probabilites can be presented in plot .

ggsurvplot(KM1, data = stroke1, risk.table = TRUE, linetype = c(1,2), pval = TRUE)
## Warning in .pvalue(fit, data = data, method = method, pval = pval, pval.coord = pval.coord, : There are no survival curves to be compared. 
##  This is a null model.

8 Kaplan-Meier estimates for groups

  1. Sex
KM1.sex <- survfit(Surv(time = time, status3b == 'dead') ~ sex, 
                     type = "kaplan-meier", data = stroke1)
summary(KM1.sex)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ sex, 
##     data = stroke1, type = "kaplan-meier")
## 
##                 sex=male 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     95       4    0.958  0.0206        0.918        0.999
##     3     71       2    0.931  0.0275        0.879        0.986
##     5     32       1    0.902  0.0391        0.828        0.982
##     6     27       2    0.835  0.0581        0.729        0.957
##     7     21       1    0.795  0.0676        0.673        0.939
##     9     15       1    0.742  0.0813        0.599        0.920
##    10     13       1    0.685  0.0929        0.525        0.894
##    12     10       1    0.617  0.1059        0.440        0.863
##    14      7       1    0.529  0.1220        0.336        0.831
##    22      4       1    0.396  0.1465        0.192        0.818
## 
##                 sex=female 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0    129       3   0.9767  0.0133       0.9511        1.000
##     1    126       6   0.9302  0.0224       0.8873        0.975
##     2    112       4   0.8970  0.0271       0.8455        0.952
##     3     98       2   0.8787  0.0295       0.8228        0.938
##     4     80       4   0.8348  0.0352       0.7685        0.907
##     5     60       4   0.7791  0.0425       0.7001        0.867
##     6     40       1   0.7596  0.0457       0.6752        0.855
##     7     37       3   0.6980  0.0541       0.5997        0.812
##     9     29       1   0.6740  0.0573       0.5705        0.796
##    12     24       3   0.5897  0.0677       0.4709        0.739
##    14     18       1   0.5570  0.0714       0.4332        0.716
##    18     15       1   0.5198  0.0757       0.3907        0.692
##    25      8       2   0.3899  0.0978       0.2385        0.637
##    28      4       1   0.2924  0.1118       0.1382        0.619
##    29      3       1   0.1949  0.1090       0.0651        0.583
##    41      2       1   0.0975  0.0879       0.0167        0.571

Among males, the survival probability falls below 0.5 between days 12 and 14, indicating an estimated median survival time of approximately 13 to 14 days. In contrast, females show a more gradual decline in survival, with the probability dropping below 0.5 between days 18 and 25, suggesting a longer median survival time of around 20 to 22 days. This indicates that female patients had better overall survival compared to males during the follow-up period.

Plot : sex

ggsurvplot(KM1.sex, data = stroke1, risk.table = TRUE, 
           linetype = c(1,2), pval = TRUE)

  1. Dm status 0 = No 1 = Yes
KM1.dm <- survfit(Surv(time = time, status3b == 'dead') ~ dm2cat, 
                     type = "kaplan-meier", data = stroke1)
summary(KM1.dm)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ dm2cat, 
##     data = stroke1, type = "kaplan-meier")
## 
##                 dm2cat=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0    152       2    0.987 0.00924        0.969        1.000
##     1    148       9    0.927 0.02124        0.886        0.969
##     2    128       4    0.898 0.02503        0.850        0.948
##     3    105       2    0.881 0.02732        0.829        0.936
##     4     85       2    0.860 0.03035        0.803        0.922
##     5     56       5    0.783 0.04287        0.704        0.872
##     6     42       3    0.727 0.05054        0.635        0.833
##     7     36       2    0.687 0.05522        0.587        0.804
##     9     29       1    0.663 0.05817        0.558        0.788
##    10     25       1    0.637 0.06160        0.527        0.770
##    12     21       4    0.515 0.07391        0.389        0.683
##    18     14       1    0.479 0.07726        0.349        0.657
##    22     10       1    0.431 0.08304        0.295        0.629
##    25      5       1    0.345 0.10174        0.193        0.615
##    29      4       1    0.258 0.10672        0.115        0.581
## 
##                 dm2cat=1 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     74       1    0.986  0.0134       0.9605        1.000
##     1     73       1    0.973  0.0189       0.9367        1.000
##     3     64       2    0.943  0.0280       0.8893        0.999
##     4     48       2    0.903  0.0382       0.8315        0.981
##     7     22       2    0.821  0.0653       0.7026        0.960
##     9     15       1    0.766  0.0807       0.6235        0.942
##    14      9       2    0.596  0.1234       0.3973        0.894
##    25      5       1    0.477  0.1453       0.2625        0.867
##    28      2       1    0.238  0.1836       0.0527        1.000
##    41      1       1    0.000     NaN           NA           NA
ggsurvplot(KM1.dm, data = stroke1, risk.table = TRUE, 
           linetype = c(1,2), pval = TRUE)

  1. Hpt 0 = No 1 = Yes
KM1.hpt <- survfit(Surv(time = time, status3b == 'dead') ~ hpt2cat, 
                     type = "kaplan-meier", data = stroke1)
summary(KM1.hpt)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ hpt2cat, 
##     data = stroke1, type = "kaplan-meier")
## 
##                 hpt2cat=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     78       1    0.987  0.0127       0.9625        1.000
##     1     76       3    0.948  0.0252       0.9001        0.999
##     2     65       2    0.919  0.0318       0.8588        0.983
##     3     53       1    0.902  0.0356       0.8346        0.974
##     4     42       1    0.880  0.0407       0.8039        0.964
##     5     25       3    0.775  0.0675       0.6530        0.919
##     6     20       1    0.736  0.0744       0.6036        0.897
##     7     19       2    0.658  0.0844       0.5122        0.846
##     9     14       1    0.611  0.0905       0.4574        0.817
##    12     10       1    0.550  0.1000       0.3854        0.786
##    18      8       1    0.481  0.1086       0.3094        0.749
##    41      2       1    0.241  0.1787       0.0562        1.000
## 
##                 hpt2cat=1 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0    148       2    0.986 0.00949       0.9681        1.000
##     1    145       7    0.939 0.01975       0.9009        0.978
##     2    132       2    0.925 0.02186       0.8828        0.968
##     3    116       3    0.901 0.02528       0.8525        0.952
##     4     91       3    0.871 0.02970       0.8147        0.931
##     5     67       2    0.845 0.03403       0.7809        0.914
##     6     47       2    0.809 0.04099       0.7326        0.894
##     7     39       2    0.768 0.04826       0.6786        0.868
##     9     30       1    0.742 0.05300       0.6451        0.854
##    10     27       1    0.715 0.05773       0.6099        0.837
##    12     24       3    0.625 0.06984       0.5023        0.778
##    14     16       2    0.547 0.08004       0.4107        0.729
##    22     12       1    0.501 0.08537       0.3592        0.700
##    25      7       2    0.358 0.10512       0.2015        0.637
##    28      3       1    0.239 0.12006       0.0891        0.640
##    29      2       1    0.119 0.10359       0.0218        0.654
ggsurvplot(KM1.hpt, data = stroke1, risk.table = TRUE, 
           linetype = c(1,2), pval = TRUE)

  1. Dyslipidemia status 0 = No 1 = Yes
KM1.dyslipid <- survfit(Surv(time = time, status3b == 'dead') ~ dyslipid2cat, 
                     type = "kaplan-meier", data = stroke1)
summary(KM1.dyslipid)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ dyslipid2cat, 
##     data = stroke1, type = "kaplan-meier")
## 
##                 dyslipid2cat=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0    199       3    0.985 0.00864       0.9681        1.000
##     1    194       9    0.939 0.01700       0.9065        0.973
##     2    176       4    0.918 0.01968       0.8801        0.957
##     3    149       4    0.893 0.02268       0.8499        0.939
##     4    119       4    0.863 0.02643       0.8129        0.917
##     5     86       5    0.813 0.03308       0.7507        0.881
##     6     64       3    0.775 0.03815       0.7036        0.853
##     7     56       4    0.720 0.04434       0.6377        0.812
##     9     43       2    0.686 0.04818       0.5979        0.787
##    10     37       1    0.668 0.05032       0.5759        0.774
##    12     33       4    0.587 0.05826       0.4829        0.713
##    14     24       2    0.538 0.06283       0.4277        0.676
##    18     20       1    0.511 0.06519       0.3978        0.656
##    22     16       1    0.479 0.06849       0.3619        0.634
##    25     10       2    0.383 0.08168       0.2523        0.582
##    28      6       1    0.319 0.08962       0.1842        0.553
##    29      5       1    0.255 0.09167       0.1264        0.516
##    41      3       1    0.170 0.09256       0.0587        0.494
## 
##                 dyslipid2cat=1 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##       1.0000      27.0000       1.0000       0.9630       0.0363       0.8943 
## upper 95% CI 
##       1.0000
ggsurvplot(KM1.dyslipid, data = stroke1, risk.table = TRUE, 
           linetype = c(1,2), pval = TRUE)

  1. Diagnosis status
KM1.icd10 <- survfit(Surv(time = time, status3b == 'dead') ~ icd10cat2, 
                     type = "kaplan-meier", data = stroke1)
summary(KM1.icd10)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ icd10cat2, 
##     data = stroke1, type = "kaplan-meier")
## 
##                 icd10cat2=CI,Others 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    147       3    0.980  0.0117        0.957        1.000
##     2    132       3    0.957  0.0171        0.924        0.991
##     4     78       2    0.933  0.0239        0.887        0.981
##     5     43       1    0.911  0.0317        0.851        0.975
##     6     27       1    0.877  0.0450        0.793        0.970
##     7     22       2    0.798  0.0676        0.676        0.942
##    12      9       2    0.620  0.1224        0.421        0.913
##    14      5       1    0.496  0.1480        0.277        0.890
##    28      3       1    0.331  0.1673        0.123        0.891
##    41      1       1    0.000     NaN           NA           NA
## 
##                 icd10cat2=Haemorrhagic 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0     77       3    0.961  0.0221       0.9188        1.000
##     1     74       7    0.870  0.0383       0.7982        0.949
##     2     65       1    0.857  0.0400       0.7818        0.939
##     3     61       4    0.801  0.0462       0.7150        0.896
##     4     55       2    0.771  0.0489       0.6814        0.873
##     5     49       4    0.708  0.0541       0.6100        0.823
##     6     40       2    0.673  0.0569       0.5703        0.794
##     7     36       2    0.636  0.0596       0.5290        0.764
##     9     32       2    0.596  0.0621       0.4858        0.731
##    10     28       1    0.575  0.0634       0.4629        0.713
##    12     25       2    0.529  0.0662       0.4137        0.676
##    14     20       1    0.502  0.0679       0.3853        0.655
##    18     16       1    0.471  0.0706       0.3510        0.632
##    22     12       1    0.432  0.0748       0.3073        0.606
##    25      7       2    0.308  0.0910       0.1728        0.550
##    29      3       1    0.206  0.1036       0.0766        0.552
ggsurvplot(KM1.icd10, data = stroke1, risk.table = TRUE, 
           linetype = c(1,2), pval = TRUE)

9 Estimate survival Probabilitis

estimation of the survival probabality at that specific time of follow-up:

stroke1 %>% group_by(status3b) %>% 
  summarize(min.dur = min(time), max.dur = max(time))
## # A tibble: 2 × 3
##   status3b min.dur max.dur
##   <fct>      <dbl>   <dbl>
## 1 alive          0      61
## 2 dead           0      41
summary(KM1, times = c(20, 40, 60))
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ 1, 
##     data = stroke1, type = "kaplan-meier")
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    20     18      47    0.527  0.0649       0.4138        0.671
##    40      3       5    0.263  0.0941       0.1308        0.531
##    60      1       1    0.176  0.0953       0.0606        0.509

Comparing the survival estimates between levels of a group (categorical) variable Log Rank Test

The null hypothesis : survival estimates between levels or groups are not different.

  1. Sex
logrank.sex <- survdiff(Surv(time = time, status3b == 'dead') ~ sex, 
                        data = stroke1, rho = 0)
logrank.sex
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ sex, 
##     data = stroke1, rho = 0)
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=male    97       15     19.7     1.126      1.89
## sex=female 129       38     33.3     0.666      1.89
## 
##  Chisq= 1.9  on 1 degrees of freedom, p= 0.2

The survival estimates between the gender group are not different( p value : 0.2)

  1. Dm status
logrank.dm <- survdiff(Surv(time = time, status3b == 'dead') ~ dm2cat, 
                        data = stroke1, rho = 0)
logrank.dm
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ dm2cat, 
##     data = stroke1, rho = 0)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## dm2cat=0 152       39     33.9     0.752      2.19
## dm2cat=1  74       14     19.1     1.340      2.19
## 
##  Chisq= 2.2  on 1 degrees of freedom, p= 0.1

The survival estimates between the DM status are not different( p value : 0.1)

  1. HPT
logrank.hpt <- survdiff(Surv(time = time, status3b == 'dead') ~ hpt2cat, 
                        data = stroke1, rho = 0)
logrank.hpt
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ hpt2cat, 
##     data = stroke1, rho = 0)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## hpt2cat=0  78       18     17.7   0.00458   0.00729
## hpt2cat=1 148       35     35.3   0.00230   0.00729
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.9

The survival estimates between the hpt status are not different( p value : 0.9)

  1. Dyslipidemia status
logrank.dyslipid <- survdiff(Surv(time = time, status3b == 'dead') ~ hpt2cat, 
                        data = stroke1, rho = 0)
logrank.dyslipid
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ hpt2cat, 
##     data = stroke1, rho = 0)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## hpt2cat=0  78       18     17.7   0.00458   0.00729
## hpt2cat=1 148       35     35.3   0.00230   0.00729
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.9

The survival estimates between the dyslipidemia status are not different( p value : 0.9)

  1. icd10category
logrank.icd10cat <- survdiff(Surv(time = time, status3b == 'dead') ~ icd10cat2, 
                        data = stroke1, rho = 0)
logrank.icd10cat
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ icd10cat2, 
##     data = stroke1, rho = 0)
## 
##                          N Observed Expected (O-E)^2/E (O-E)^2/V
## icd10cat2=CI,Others    149       17     25.8      3.02      6.87
## icd10cat2=Haemorrhagic  77       36     27.2      2.87      6.87
## 
##  Chisq= 6.9  on 1 degrees of freedom, p= 0.009

The Survival estimates between stroke type ( CI/other and haemorhagic ) are different at the level of 5% significance (p-value = 0.009).

10 Cox propotional hazard (PH) regression

Univariable Simple Cox PH Regression

  1. For GCS ( numerical )
cox.gcs <- coxph(Surv(time = time, event = status3b == 'dead') ~ gcs, 
                 data = stroke1)
summary(cox.gcs)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     gcs, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##         coef exp(coef) se(coef)     z Pr(>|z|)    
## gcs -0.18784   0.82875  0.03227 -5.82 5.89e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## gcs    0.8288      1.207    0.7779    0.8829
## 
## Concordance= 0.783  (se = 0.035 )
## Likelihood ratio test= 34.14  on 1 df,   p=5e-09
## Wald test            = 33.87  on 1 df,   p=6e-09
## Score (logrank) test = 39.48  on 1 df,   p=3e-10

The simple cox PH model with covariate gcs shows that with each one unit increase in gcs, the crude log hazard for death changes by factor of -0.192. The pvalue is significant. exponentiating the log HR, the simple cox shows that with increase one unot of gcs, the crude risk for death decreases for about 17% and the decrease are between 95% CI ( 0.774, 0.8798).

  1. Age
cox.age <- coxph(Surv(time = time, event = status3b == 'dead') ~ age2, 
                 data = stroke1)
summary(cox.age)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     age2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##         coef exp(coef) se(coef)     z Pr(>|z|)  
## age2 0.02441   1.02471  0.01123 2.173   0.0298 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## age2     1.025     0.9759     1.002     1.048
## 
## Concordance= 0.61  (se = 0.052 )
## Likelihood ratio test= 4.82  on 1 df,   p=0.03
## Wald test            = 4.72  on 1 df,   p=0.03
## Score (logrank) test = 4.69  on 1 df,   p=0.03

The simple cox PH model with covariate age shows that with each one unit increase in age, the crude log hazard for death changes by factor of 1.025.

  1. For sex ( categorical)
cox.sex <- coxph(Surv(time = time, event = status3b == 'dead') ~ sex, 
                 data = stroke1)
summary(cox.sex)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     sex, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##             coef exp(coef) se(coef)     z Pr(>|z|)
## sexfemale 0.4179    1.5187   0.3071 1.361    0.174
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## sexfemale     1.519     0.6584     0.832     2.772
## 
## Concordance= 0.572  (se = 0.038 )
## Likelihood ratio test= 1.95  on 1 df,   p=0.2
## Wald test            = 1.85  on 1 df,   p=0.2
## Score (logrank) test = 1.88  on 1 df,   p=0.2

in simple cox PH regression , the covariote sex is not significant ( p-value 0.174)

  1. For stroke type
cox.icd10cat2 <- coxph(Surv(time = time, event = status3b == 'dead') ~ icd10cat2, 
                 data = stroke1)
summary(cox.icd10cat2)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     icd10cat2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                         coef exp(coef) se(coef)     z Pr(>|z|)  
## icd10cat2Haemorrhagic 0.7922    2.2082   0.3090 2.564   0.0104 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## icd10cat2Haemorrhagic     2.208     0.4528     1.205     4.047
## 
## Concordance= 0.65  (se = 0.041 )
## Likelihood ratio test= 6.95  on 1 df,   p=0.008
## Wald test            = 6.57  on 1 df,   p=0.01
## Score (logrank) test = 6.83  on 1 df,   p=0.009

The simple Cox Ph model woth covariate stroke type shows that patients with haemorhagic stroke has the crude log hazard for death 2.208 times compared to patients with inchaemic type/others ( pvalue = 0.014) by exponentiate the hazard log hazard, we will get hazard ratio.

tidy(cox.icd10cat2,
     exponentiate = TRUE,
     conf.int = TRUE)
## # A tibble: 1 × 7
##   term                  estimate std.error statistic p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 icd10cat2Haemorrhagic     2.21     0.309      2.56  0.0104     1.21      4.05

Patients with haemorhagic stroke shas Hazard ration of 2.208 compared to patient with inschameic stroke ( pvalue = 0.0104 and 95%CI 1.205,4.047)

  1. Hypertension
cox.hpt2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ hpt2cat, 
                 data = stroke1)
summary(cox.hpt2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     hpt2cat, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##              coef exp(coef) se(coef)      z Pr(>|z|)
## hpt2cat1 -0.02418   0.97611  0.29336 -0.082    0.934
## 
##          exp(coef) exp(-coef) lower .95 upper .95
## hpt2cat1    0.9761      1.024    0.5493     1.735
## 
## Concordance= 0.511  (se = 0.041 )
## Likelihood ratio test= 0.01  on 1 df,   p=0.9
## Wald test            = 0.01  on 1 df,   p=0.9
## Score (logrank) test = 0.01  on 1 df,   p=0.9
  1. Dyslipidemia
cox.dyslipid2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ dyslipid2cat, 
                 data = stroke1)
summary(cox.dyslipid2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     dyslipid2cat, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                  coef exp(coef) se(coef)      z Pr(>|z|)
## dyslipid2cat1 -1.4174    0.2424   1.0137 -1.398    0.162
## 
##               exp(coef) exp(-coef) lower .95 upper .95
## dyslipid2cat1    0.2424      4.126   0.03323     1.767
## 
## Concordance= 0.534  (se = 0.02 )
## Likelihood ratio test= 3.17  on 1 df,   p=0.08
## Wald test            = 1.95  on 1 df,   p=0.2
## Score (logrank) test = 2.3  on 1 df,   p=0.1
  1. Diabetes
cox.dm2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ dm2cat, 
                 data = stroke1)
summary(cox.dm2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     dm2cat, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##            coef exp(coef) se(coef)      z Pr(>|z|)
## dm2cat1 -0.4618    0.6302   0.3127 -1.477     0.14
## 
##         exp(coef) exp(-coef) lower .95 upper .95
## dm2cat1    0.6302      1.587    0.3414     1.163
## 
## Concordance= 0.574  (se = 0.035 )
## Likelihood ratio test= 2.33  on 1 df,   p=0.1
## Wald test            = 2.18  on 1 df,   p=0.1
## Score (logrank) test = 2.22  on 1 df,   p=0.1

for hypertension status, DM status , and dyslipidemia status , the covariates are not significant.

11 Multivariable Cox PH

  1. Main Effect Model
cox.mv <- coxph(Surv(time = time, event = status3b == 'dead') ~  gcs  +
                age2 + icd10cat2, data = stroke1)
summary(cox.mv)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     gcs + age2 + icd10cat2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                           coef exp(coef) se(coef)      z Pr(>|z|)    
## gcs                   -0.18328   0.83253  0.03364 -5.449 5.07e-08 ***
## age2                   0.03169   1.03220  0.01120  2.829  0.00468 ** 
## icd10cat2Haemorrhagic  0.47526   1.60843  0.31408  1.513  0.13024    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## gcs                      0.8325     1.2012    0.7794    0.8893
## age2                     1.0322     0.9688    1.0098    1.0551
## icd10cat2Haemorrhagic    1.6084     0.6217    0.8691    2.9768
## 
## Concordance= 0.817  (se = 0.03 )
## Likelihood ratio test= 43.55  on 3 df,   p=2e-09
## Wald test            = 41.91  on 3 df,   p=4e-09
## Score (logrank) test = 48.65  on 3 df,   p=2e-10
  1. Model with Interaction

Numerical and Numerical

cox.gcs.age <- coxph(Surv(time = time, event = status3b == 'dead') ~  gcs  +
                age2 + icd10cat2 + gcs:age2, data = stroke1)
summary(cox.gcs.age)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     gcs + age2 + icd10cat2 + gcs:age2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                            coef exp(coef)  se(coef)      z Pr(>|z|)  
## gcs                   -0.366160  0.693392  0.180317 -2.031   0.0423 *
## age2                   0.006961  1.006986  0.026258  0.265   0.7909  
## icd10cat2Haemorrhagic  0.491785  1.635233  0.320404  1.535   0.1248  
## gcs:age2               0.002946  1.002950  0.002839  1.038   0.2995  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## gcs                      0.6934     1.4422    0.4870    0.9873
## age2                     1.0070     0.9931    0.9565    1.0602
## icd10cat2Haemorrhagic    1.6352     0.6115    0.8727    3.0641
## gcs:age2                 1.0030     0.9971    0.9974    1.0085
## 
## Concordance= 0.814  (se = 0.03 )
## Likelihood ratio test= 44.62  on 4 df,   p=5e-09
## Wald test            = 38.96  on 4 df,   p=7e-08
## Score (logrank) test = 49.75  on 4 df,   p=4e-10

Numerical and Categorical

cox.gcs.icd10cat2 <- coxph(Surv(time = time, event = status3b == 'dead') ~  gcs  +
                age2 + icd10cat2 + gcs:icd10cat2, data = stroke1)
summary(cox.gcs.icd10cat2)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     gcs + age2 + icd10cat2 + gcs:icd10cat2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                               coef exp(coef) se(coef)      z Pr(>|z|)   
## gcs                       -0.15645   0.85518  0.05424 -2.884  0.00392 **
## age2                       0.03216   1.03268  0.01118  2.876  0.00403 **
## icd10cat2Haemorrhagic      0.88815   2.43064  0.72681  1.222  0.22172   
## gcs:icd10cat2Haemorrhagic -0.04503   0.95597  0.07018 -0.642  0.52112   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                           exp(coef) exp(-coef) lower .95 upper .95
## gcs                          0.8552     1.1694    0.7689    0.9511
## age2                         1.0327     0.9683    1.0103    1.0556
## icd10cat2Haemorrhagic        2.4306     0.4114    0.5849   10.1015
## gcs:icd10cat2Haemorrhagic    0.9560     1.0461    0.8331    1.0969
## 
## Concordance= 0.819  (se = 0.029 )
## Likelihood ratio test= 43.97  on 4 df,   p=7e-09
## Wald test            = 43.19  on 4 df,   p=9e-09
## Score (logrank) test = 50.58  on 4 df,   p=3e-10

12 Model Comparison

anova(cox.mv, cox.gcs.age)
## Analysis of Deviance Table
##  Cox model: response is  Surv(time = time, event = status3b == "dead")
##  Model 1: ~ gcs + age2 + icd10cat2
##  Model 2: ~ gcs + age2 + icd10cat2 + gcs:age2
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -206.82                     
## 2 -206.29 1.0698  1      0.301
anova(cox.mv, cox.gcs.icd10cat2)
## Analysis of Deviance Table
##  Cox model: response is  Surv(time = time, event = status3b == "dead")
##  Model 1: ~ gcs + age2 + icd10cat2
##  Model 2: ~ gcs + age2 + icd10cat2 + gcs:icd10cat2
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -206.82                     
## 2 -206.61 0.4189  1     0.5175

In model comparison , we compare between model with interaction term. We observe the same result ( p value >0.05). We decide not to add interaction term in the model (parsimonous model)

13 Model Checking Plotting Kaplan-Meier

  1. Linearity in hazard assumption For numerical

Age2 and gcs

ggcoxfunctional(Surv(time, status3b == "dead") ~ age2 + gcs, data = stroke1)
## Warning: arguments formula is deprecated; will be removed in the next version;
## please use fit instead.

Linearity assummed

14 Propotional Hazard Assumption

The main assumption in Cox PH regression is that the estimated hazard is proportional across the follow-up time.

  1. KM Method
prop.h.km <- cox.zph(cox.mv, transform = 'km', global = TRUE)
prop.h.km
##            chisq df     p
## gcs       0.0337  1 0.854
## age2      0.8474  1 0.357
## icd10cat2 3.2547  1 0.071
## GLOBAL    4.2053  3 0.240
plot(prop.h.km)

  1. The Rank Method
prop.h.rank <- cox.zph(cox.mv, transform = 'rank')
prop.h.rank
##           chisq df     p
## gcs       2.345  1 0.126
## age2      0.435  1 0.510
## icd10cat2 2.927  1 0.087
## GLOBAL    5.132  3 0.162
plot(prop.h.rank)

15 Model Checking

final model : cox.mv

  1. Residuals

We can use residuals to assess for model fitness. They are useful to check for overall model fitness or for individual subjects fitness. The residuals can indicate the presence of outliers or influential subjects in our model.

residuals() can be calculated to produce martingale, deviance, score or Schoenfeld residuals for a Cox proportional hazards model.

1.1 Score Residuals

score.cox <- resid(cox.mv, type = "score")
head(score.cox)
##          gcs       age2 icd10cat2Haemorrhagic
## 1  5.7824817  6.3640609           -0.58119140
## 2 -0.2464467  0.0410133            0.02432698
## 3 -1.9912752 -0.4886707           -0.07679850
## 4 -2.4152637 -4.5979977           -0.19893897
## 5 -0.7692424 -0.3278447           -0.04856296
## 6 -0.8077002  8.5290419            0.16910526

1.2 Martingale residuals

marti.cox <- resid(cox.mv, type = "martingale")
head(marti.cox)
##           1           2           3           4           5           6 
##  0.94290458 -0.04140021 -0.28417583 -0.77708991 -0.12505775  0.47075406

1.3 Schoenfeld residuals

schoen.cox <- resid(cox.mv, type = "schoenfeld")
head(schoen.cox)
##         gcs       age2 icd10cat2Haemorrhagic
## 0 -4.591998  -8.317883             0.3859729
## 0  2.408002 -11.317883             0.3859729
## 0 -5.591998 -18.317883             0.3859729
## 1  2.062987   2.147824             0.4120769
## 1 -5.937013  -3.852176            -0.5879231
## 1 -5.937013  16.147824             0.4120769

1.4 Scaled Schoenfeld residuals

sschoen.cox <- resid(cox.mv, type = "scaledsch")
head(sschoen.cox)
##            gcs          age2 icd10cat2Haemorrhagic
## 0 -0.397092922 -0.0079317873              1.710203
## 0  0.027059479 -0.0380863223              2.536873
## 0 -0.442499182 -0.0730107090              1.348059
## 1 -0.009896065  0.0526213599              2.944268
## 1 -0.609036105  0.0009486449             -3.449363
## 1 -0.510036647  0.1574209009              2.246893

1.5 dfbeta

dfbeta.cox <- resid(cox.mv, type = "dfbeta")
head(dfbeta.cox)
##            [,1]          [,2]         [,3]
## 1  0.0049629628  3.834251e-04 -0.040543099
## 2 -0.0002211700  2.266179e-05  0.001822086
## 3 -0.0024253417 -4.054244e-05 -0.012605425
## 4 -0.0030874383 -5.987001e-04 -0.027493473
## 5 -0.0009787822 -4.146322e-05 -0.006794891
## 6 -0.0007394547  1.167581e-03  0.018494869
  1. Residual Plot Plot to identify the outliers using score residuals
plot(stroke1$gcs, score.cox[,2], ylab="Score residuals")

plot(stroke1$age2, score.cox[,1], ylab="Score residuals")

Plot to identify the outliers using martingale residuals

plot(stroke1$age2, marti.cox, ylab = "Martingale residuals for age")

plot(marti.cox, type = 'h', main = "Martingale residuals", ylab = "dfbetas")

Or , we use the augment() function to do similar tasks as above. The resulting datasets consists of - the fitted variable

pred.cox.mv <- augment(cox.mv, data = stroke1)
pred.cox.mv
## # A tibble: 226 × 12
##     time status3b  age2   gcs sex    dm2cat icd10cat2    hpt2cat dyslipid2cat
##    <dbl> <fct>    <dbl> <dbl> <fct>  <fct>  <fct>        <fct>   <fct>       
##  1     4 dead        69    15 female 1      CI,Others    1       0           
##  2     3 alive       64    15 male   0      CI,Others    1       0           
##  3    16 alive       63    15 female 1      Haemorrhagic 1       1           
##  4    23 alive       67    11 female 0      Haemorrhagic 1       0           
##  5     5 alive       66    15 male   0      Haemorrhagic 1       0           
##  6     4 dead        78     7 female 0      Haemorrhagic 0       0           
##  7    22 alive       55     5 female 1      Haemorrhagic 1       0           
##  8    14 dead        65    13 female 1      CI,Others    1       0           
##  9     4 alive       67    15 male   0      CI,Others    1       0           
## 10     2 alive       56    15 male   0      CI,Others    0       0           
## # ℹ 216 more rows
## # ℹ 3 more variables: .fitted <dbl>, .se.fit <dbl>, .resid <dbl>

16 Prediction

From the Cox PH , we can predict 1. The linear predictor 2. The risk 3. The expected number of events given the covariates and follow up time

We make a new data and name them as newdata using expand.grid() function:

our model

summary(cox.mv)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~ 
##     gcs + age2 + icd10cat2, data = stroke1)
## 
##   n= 226, number of events= 53 
## 
##                           coef exp(coef) se(coef)      z Pr(>|z|)    
## gcs                   -0.18328   0.83253  0.03364 -5.449 5.07e-08 ***
## age2                   0.03169   1.03220  0.01120  2.829  0.00468 ** 
## icd10cat2Haemorrhagic  0.47526   1.60843  0.31408  1.513  0.13024    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## gcs                      0.8325     1.2012    0.7794    0.8893
## age2                     1.0322     0.9688    1.0098    1.0551
## icd10cat2Haemorrhagic    1.6084     0.6217    0.8691    2.9768
## 
## Concordance= 0.817  (se = 0.03 )
## Likelihood ratio test= 43.55  on 3 df,   p=2e-09
## Wald test            = 41.91  on 3 df,   p=4e-09
## Score (logrank) test = 48.65  on 3 df,   p=2e-10
tidy(cox.mv)
## # A tibble: 3 × 5
##   term                  estimate std.error statistic      p.value
##   <chr>                    <dbl>     <dbl>     <dbl>        <dbl>
## 1 gcs                    -0.183     0.0336     -5.45 0.0000000507
## 2 age2                    0.0317    0.0112      2.83 0.00468     
## 3 icd10cat2Haemorrhagic   0.475     0.314       1.51 0.130
stroke1 %>% select(gcs, age2, icd10cat2) %>% summary()
##       gcs             age2              icd10cat2  
##  Min.   : 3.00   Min.   :20.00   CI,Others   :149  
##  1st Qu.:10.00   1st Qu.:52.00   Haemorrhagic: 77  
##  Median :15.00   Median :61.00                     
##  Mean   :12.44   Mean   :60.82                     
##  3rd Qu.:15.00   3rd Qu.:68.75                     
##  Max.   :15.00   Max.   :94.00
new_data <- expand.grid(gcs = c(5, 10, 12),
                        age2 = c(40, 50, 60),
                        icd10cat2 = c('CI,Others', 'Haemorrhagic'))
                       
new_data
##    gcs age2    icd10cat2
## 1    5   40    CI,Others
## 2   10   40    CI,Others
## 3   12   40    CI,Others
## 4    5   50    CI,Others
## 5   10   50    CI,Others
## 6   12   50    CI,Others
## 7    5   60    CI,Others
## 8   10   60    CI,Others
## 9   12   60    CI,Others
## 10   5   40 Haemorrhagic
## 11  10   40 Haemorrhagic
## 12  12   40 Haemorrhagic
## 13   5   50 Haemorrhagic
## 14  10   50 Haemorrhagic
## 15  12   50 Haemorrhagic
## 16   5   60 Haemorrhagic
## 17  10   60 Haemorrhagic
## 18  12   60 Haemorrhagic

17 Linear Predictor

model : cox.mv

predict(cox.mv, newdata = new_data, type = 'lp')
##           1           2           3           4           5           6 
##  0.70287990 -0.21352530 -0.58008739  1.01979686  0.10339165 -0.26317043 
##           7           8           9          10          11          12 
##  1.33671381  0.42030861  0.05374653  1.17813573  0.26173053 -0.10483155 
##          13          14          15          16          17          18 
##  1.49505269  0.57864749  0.21208541  1.81196965  0.89556445  0.52900236
augment(cox.mv, newdata = new_data)
## # A tibble: 18 × 5
##      gcs  age2 icd10cat2    .fitted .se.fit
##    <dbl> <dbl> <fct>          <dbl>   <dbl>
##  1     5    40 CI,Others     0.703   0.329 
##  2    10    40 CI,Others    -0.214   0.242 
##  3    12    40 CI,Others    -0.580   0.233 
##  4     5    50 CI,Others     1.02    0.270 
##  5    10    50 CI,Others     0.103   0.141 
##  6    12    50 CI,Others    -0.263   0.121 
##  7     5    60 CI,Others     1.34    0.250 
##  8    10    60 CI,Others     0.420   0.0818
##  9    12    60 CI,Others     0.0537  0.0167
## 10     5    40 Haemorrhagic  1.18    0.391 
## 11    10    40 Haemorrhagic  0.262   0.356 
## 12    12    40 Haemorrhagic -0.105   0.364 
## 13     5    50 Haemorrhagic  1.50    0.355 
## 14    10    50 Haemorrhagic  0.579   0.312 
## 15    12    50 Haemorrhagic  0.212   0.319 
## 16     5    60 Haemorrhagic  1.81    0.353 
## 17    10    60 Haemorrhagic  0.896   0.305 
## 18    12    60 Haemorrhagic  0.529   0.310

18 Risk score

predict(cox.mv, newdata = new_data, type = 'risk')
##         1         2         3         4         5         6         7         8 
## 2.0195605 0.8077317 0.5598494 2.7726315 1.1089256 0.7686109 3.8065140 1.5224313 
##         9        10        11        12        13        14        15        16 
## 1.0552171 3.2483128 1.2991764 0.9004762 4.4595715 1.7836244 1.2362535 6.1224947 
##        17        18 
## 2.4487176 1.6972382

19 The expected number of events for a given follow-up time

new_data2 <- expand.grid(status3b = 'dead', time = c(20, 40, 50))
new_data2
##   status3b time
## 1     dead   20
## 2     dead   40
## 3     dead   50

Combine new_data and new_data2

new_data3 <- data.frame(new_data, new_data2)
head(new_data3)
##   gcs age2 icd10cat2 status3b time
## 1   5   40 CI,Others     dead   20
## 2  10   40 CI,Others     dead   40
## 3  12   40 CI,Others     dead   50
## 4   5   50 CI,Others     dead   20
## 5  10   50 CI,Others     dead   40
## 6  12   50 CI,Others     dead   50

the predicted number of events are

pred.exp <- predict(cox.mv, newdata = new_data3, type = 'expected')
cbind(new_data3, pred.exp)
##    gcs age2    icd10cat2 status3b time  pred.exp
## 1    5   40    CI,Others     dead   20 0.5690970
## 2   10   40    CI,Others     dead   40 0.4934196
## 3   12   40    CI,Others     dead   50 0.5291371
## 4    5   50    CI,Others     dead   20 0.7813067
## 5   10   50    CI,Others     dead   40 0.6774101
## 6   12   50    CI,Others     dead   50 0.7264463
## 7    5   60    CI,Others     dead   20 1.0726470
## 8   10   60    CI,Others     dead   40 0.9300086
## 9   12   60    CI,Others     dead   50 0.9973298
## 10   5   40 Haemorrhagic     dead   20 0.9153501
## 11  10   40 Haemorrhagic     dead   40 0.7936287
## 12  12   40 Haemorrhagic     dead   50 0.8510777
## 13   5   50 Haemorrhagic     dead   20 1.2566737
## 14  10   50 Haemorrhagic     dead   40 1.0895638
## 15  12   50 Haemorrhagic     dead   50 1.1684348
## 16   5   60 Haemorrhagic     dead   20 1.7252730
## 17  10   60 Haemorrhagic     dead   40 1.4958497
## 18  12   60 Haemorrhagic     dead   50 1.6041308

Conclusion final table for multivariable cox PH

tbl_regression(cox.mv, exponentiate = TRUE)
Characteristic HR1 95% CI1 p-value
earliest Glasgow Coma Scale 0.83 0.78, 0.89 <0.001
age in years 1.03 1.01, 1.06 0.005
0=Cerebral Isch and others, 1=Haemorrhagic


    CI,Others
    Haemorrhagic 1.61 0.87, 2.98 0.13
1 HR = Hazard Ratio, CI = Confidence Interval

Interpretation :

  1. Every 1 unit increase in gcs, the patients has 17% lower risk of dying , adjusted to age and gcs ( pvalue<0.001, 95% CI 0.78, 0.89)

  2. Every 1-year increase in age is expected to increase the hazard of dying by 1.03 times (pvalue 0.005, 95% CI 1.01, 1.06), adjusted to gcs and stroke type

  3. The simple Cox Ph model with covariate stroke type shows that patients with haemorhagic stroke has the crude log hazard for death 2.208 times compared to patients with inchaemic type/others ( pvalue = 0.014) by exponentiate the hazard log hazard, we will get hazard ratio. However, when being fitted to final model, covariate stroke type loses its significance.